Abstract

Distributed clusters like the Grid and PlanetLab 
enable the same statistical multiplexing efficiency 
gains for computing as the Internet provides for net- 
working. One major challenge is allocating resources 
in an economically efficient and low-latency way. A 
common solution is proportional share, where users 
each get resources in proportion to their pre-defined 
weight. However, this does not allow users to differ- 
entiate the value of their jobs. This leads to economic 
inefficiency. In contrast, systems that require reser- 
vations impose a high latency (typically minutes to 
hours) to acquire resources. 

We present Tycoon, a market-based distributed
resource allocation system based on proportional
share. The key advantages of Tycoon are that it al- 
lows users to differentiate the value of their jobs, its 
resource acquisition latency is limited only by com- 
munication delays, and it imposes no manual bidding 
overhead on users. We present experimental results 
using a prototype implementation of our design. 

1 Introduction 

A key advantage of distributed systems like the Grid
and PlanetLab is their ability to pool together
shared computational resources. This allows
increased throughput because of statistical multi- 
plexing and the bursty utilization pattern of typi- 
cal users. Sharing nodes that are dispersed in the 
network allows lower delay because applications can 
store data close to users. Finally, sharing allows 
greater reliability because of redundancy in hosts 
and network connections. 

The key issue for shared resources is allocation. 
One solution is to add more capacity. If resources 
are already optimally allocated, then this is the only 
solution, albeit a costly one. In all other cases, allo- 
cation and additional capacity are complementary. 
In addition, in peer-to-peer systems where
organizations both consume and provide resources (e.g.,
PlanetLab), careful allocation can effectively
increase capacity by providing assurances to reluctant
organizations that contributions will be returned in 
kind. 

However, resource allocation remains a difficult 
problem. The key challenges for resource allocation 
in distributed systems are: strategic users who act 
in their own interests, a rapidly changing and un- 
predictable demand, and hundreds or thousands of 
unreliable hosts that are physically and
administratively distributed.

Our approach is to incorporate an economic
mechanism (e.g., an auction) into the resource
allocation system. Systems without such mechanisms
typically assume that task values (i.e.,
their importance) are the same, or are inversely 
proportional to the resources required, or are set 
by an omniscient administrator. However, in many 
cases, task values vary significantly, are not cor- 
related to resource requirements, and are difficult 
and time-consuming for an administrator to set. 
In contrast, market-based resource allocation systems
rely on users to set the values of
their own jobs and provide a mechanism to encour- 
age users to truthfully reveal those values. 

Despite these advantages, we are not aware of any 
currently operational market-based resource alloca- 
tion systems for computational resources. We believe 
one key impediment is that previously proposed sys- 
tems impose a significant burden on users: frequent 
interactive bidding, or, conversely, infrequent bid- 
ding that increases the latency to acquire resources. 
Most users would prefer to run their program as 
they would without a market-based system and for- 
get about it until it is done. The latency to acquire 
resources is important for applications like a web 
server that needs to allocate resources quickly in 
reaction to unexpected events (e.g., breaking news 
stories from CNN). In addition, many market-based 
systems rely on a centralized market that limits re- 
liability and scalability. 



In this paper, we present the Tycoon distributed, 
market-based resource allocation system. Each pro- 
viding Tycoon host runs an auctioneer process that 
multiplexes the local physical resources for one or 
more virtual hosts (using Linux VServers). As a
result, if an auctioneer fails, users can still acquire
resources at other hosts. Clients request resources 
from auctioneers using continuous bids that can be 
as infrequent as the user wishes while still allowing 
immediate acquisition of resources. 

The contribution of this paper is the design,
implementation, and evaluation of Tycoon. We describe a
prototype implementation of our design running on a 
22-host cluster distributed between Palo Alto in Cal- 
ifornia and Bristol in the United Kingdom. Tycoon 
can reallocate all of the hosts in this cluster in less 
than 30 seconds. We show that Tycoon encourages 
efficient usage of resources even when users make no 
explicit bids at all. We show that Tycoon provides 
these benefits with little overhead. Running a typical
task on a Tycoon host incurs less than 5%
overhead compared to an identical non-Tycoon host.
Using our current modest server infrastructure (450 
MHz x86 CPU, 100 MB/s Ethernet), limited tests 
indicate that our current design scales to 500 hosts 
and 24 simultaneous active users (or any other com- 
bination with a product of 12,000). The main limita- 
tion of this implementation is that it only manages 
CPU cycles (not memory, disk, etc.), but we expect 
to resolve this by upgrading the virtualization soft- 
ware. 

The paper is organized as follows. In § 2, we give
an overview of the Tycoon design. In § 3, we describe
the Tycoon architecture in detail. In § 4, we present
the results of experiments using the Tycoon system.
In § 5, we review related work in resource allocation.
We describe some extensions to the basic design in
§ 6 and conclude in § 7.

2 Design Overview 

In this section, we present the service model and 
interface that Tycoon provides to users. We describe 
the architecture of Tycoon in more detail in § 3.

2.1 Service Model Abstraction 

The purpose of Tycoon is to allocate compute re- 
sources like CPU cycles, memory, network band- 
width, etc. to users in an economically efficient way. 
In other words, the resources are allocated to the 
users who value them the most. To give users an 
incentive to truthfully reveal how much they value 
resources, users use a limited budget of credits to bid
for resources. The form of a bid is (h, r, b, t), where
h is the host to bid on, r is the resource type, b is
the number of credits to bid, and t is the time
interval over which to bid. This bid says, "I'd like as
much of r on h as possible for t seconds of usage, for
which I'm willing to pay b". This is a continuous bid
in that it is in effect until it is cancelled or the user
runs out of money.

The user submits this bid to the auctioneer that 
runs on host h. This auctioneer calculates $b_i/t_i$ for
each bid i and resource r and allocates its resources 
in proportion to the bids. This is a "best-effort" al- 
location in that the allocation may change as other 
bids change, applications start and stop, etc. Credits 
are not spent at the time of the bid; the user must 
utilize the resource to burn the credits. To do this, 
a user uses ssh to run a program. The t seconds of 
usage can be used immediately or later and at the 
same time or in pieces, as the user wishes. 

Note that the auctioneers are completely indepen- 
dent and do not share information. As a result, if a 
user requires resources on two separate hosts, it is 
his responsibility to send bids to those two markets. 
Also, markets for two different resources on the same 
host are separate. 

This service model has two advantages. First, the 
continuous bid allows user agents to express more so- 
phisticated preferences because they can place differ- 
ent bids in different markets. Specific auctioneers can 
differentiate themselves in a wide variety of ways. 
For example, an auctioneer could have more of a 
resource (e.g., more CPU cycles), better
quality-of-service (e.g., a guaranteed minimum number of CPU
cycles), a favorable network location, etc. A user 
agent can compose bids however it sees fit to sat- 
isfy user preferences. Second, since the auctioneers 
push responsibility for expressing sophisticated bids 
onto user agents, the core infrastructure can remain 
efficient, scalable, secure, and reliable. The efficiency 
and scalability are a result of using only local infor- 
mation to manage local resources and operating over 
very simple bids. The security and reliability are a 
result of independence between different auctioneers. 

2.2 Interface 

In this section, we describe how a user uses the
system. The interface requirements are important
because we believe the bidding requirements of
previous economic systems were burdensome for users.

Command                                Action

tycoon create_account host0 10 10 10   Create an account on host0 with a bid of
                                       10 initial credits for CPU cycles,
                                       memory, and disk.
tycoon fund host0 cpu 90 1000          Fund the account on host0 using 90
                                       credits to be spent over 1000 seconds
                                       for CPU cycles.
tycoon set_interval host0 cpu 2000     Change the bid interval on the account to
                                       2000 seconds for CPU cycles.
tycoon get_status host0                Get the status of the account, including
                                       the current balance, current interval,
                                       etc. for each of the resources.

Table 1: The main Tycoon user commands.

Table 1 lists the main Tycoon user commands. These
are currently implemented as a Linux command-line
tool, but they could easily be implemented in a
graphical user interface. The first action a user
takes is to create an account on a providing
host. This notifies auctioneers that a user intends to 
bid on that host and makes an initial bid. The bid 
interval defaults to 10,000,000 seconds so that the 
user is unlikely to run out of money. Account cre- 
ation only needs to be done rarely (in most cases 
once) per user and host. Users usually perform ac- 
count creation, like the operations that follow, on 
many hosts, so the command-line tool allows the 
same operation to be performed on multiple hosts 
in parallel. 

At this point, the user can ssh into hosts and run 
his application. Users are not required to change 
their bids when they start and stop tasks. They 
can do so to optimize their resource usage, if they 
wish. However, the auctioneers will still deduct cred- 
its when he runs. As a result, users who run in- 
frequently will get more resources than those who 
run continuously. If the user chooses, he can trans- 
fer more money to his account and/or change the 
bidding interval. He might have a critical task for 
which he is willing to spend credits at a higher rate, 
or, conversely, he might have a very low priority job, 
for which he wishes to decrease his spending rate. 
The key point is that the users are relieved from any 
mandatory interaction with the system. 

3 Architecture 

Tycoon is split into the following components: ser- 
vice location service (SLS), bank, auctioneer, and 
agent. The designs of the SLS and bank are not novel,
but we describe them here because they are neces- 
sary components for a working implementation. 

3.1 Service Location Service 

Auctioneers use the service location service to adver- 
tise resources, and agents use it to locate resources 
(as shown in steps 1 and 2 in Figure 1). Our
prototype uses a simple centralized soft-state server,
but the other components would work just as well
with more sophisticated and scalable service location
systems (e.g., Ganglia and SWORD).
Auctioneers register their status with the SLS every 
30 seconds and the SLS de-registers any auctioneer 
that has not contacted it within the past 120 sec- 
onds. This status consists of the total amount bid 
on the host for each resource, the total amount of 
each resource type available (e.g., CPU speed, mem- 
ory size, disk space), etc. The status is cryptograph- 
ically signed by the auctioneer and includes the auc- 
tioneer's public key. Clients store this key and use it 
to authenticate the results of later queries and also 
to authenticate direct communications with the auc- 
tioneer. 

The soft-state design allows the system to be ro- 
bust against many forms of hardware and software 
failures. The querying agents may receive stale infor- 
mation from the SLS, but they will receive updated 
information if they elect to contact an auctioneer 
directly. 
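
The soft-state behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the field names, and the shape of the status record are hypothetical, and signature checking is omitted. Only the 30-second registration period and the 120-second expiry come from the text.

```python
import time
from dataclasses import dataclass, field

REGISTER_PERIOD_S = 30   # auctioneers re-register this often (from the text)
EXPIRY_S = 120           # entries not refreshed within this window are dropped (from the text)


@dataclass
class SoftStateRegistry:
    """Hypothetical in-memory stand-in for the SLS; not the Tycoon implementation."""
    entries: dict = field(default_factory=dict)  # host -> (status, last_seen)

    def register(self, host, status, now=None):
        # A registration simply overwrites the previous entry and refreshes its timestamp.
        now = time.time() if now is None else now
        self.entries[host] = (status, now)

    def query(self, now=None):
        # Drop every auctioneer that has not been heard from within EXPIRY_S seconds,
        # then return the (possibly stale) status of the remaining ones.
        now = time.time() if now is None else now
        self.entries = {
            host: (status, seen)
            for host, (status, seen) in self.entries.items()
            if now - seen <= EXPIRY_S
        }
        return {host: status for host, (status, _) in self.entries.items()}
```

Because the registry holds no state that cannot be rebuilt from the next round of registrations, a restarted server recovers on its own within one registration period.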

3.2 Bank 

The bank maintains account balances for all users 
and providers. Its main task is to transfer funds from 
a client's account to a provider's account (shown in 
step 3 in Figure 1).

We assume that the bank has a well-known pub- 
lic key and that the bank has the public keys of all 
the users. These are the same requirements for any 
user to securely use a host with or without a market- 
based resource allocation system. We further assume 
roughly synchronized clocks. In describing the trans- 
fer protocol, we use Alice and Bob as the fictional 
example sender and receiver. Alice begins by sending 
a message to the bank as follows: 

    Alice, Bob, amount, time,
    Sign_Alice(Alice, Bob, amount, time)

Sign_Alice is the DSA signature function using Alice's
private key. The bank verifies that the signature is 
correct, which implies that the message is from Alice, 




that the funds are for Bob, and that the amount 
and time are as specified. The bank keeps a list of
recent messages and verifies that this message is new, 
thus guarding against replay attacks. Assuming this 
is all correct and the funds are available, the bank 
transfers amount from Alice to Bob and responds 
with the following message (the receipt): 

    Alice, Bob, amount, time,
    Sign_Bank(Alice, Bob, amount, time)

The bank sends the same time as in the first mes- 
sage. Alice verifies that the amount, time, and recip- 
ient are the same as the original message and that 
the signature is correct. Assuming the verification 
is successful, Alice forwards this message to Bob as 
described in § 3.3. Bob keeps a list of recent receipts
and verifies that this receipt is new, thus guarding 
against replay attacks. 
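
The transfer protocol above can be sketched as follows. This is a minimal sketch under loud assumptions: the paper uses DSA public-key signatures, whereas the `sign` helper here is an HMAC-SHA256 stand-in chosen only because it is in the Python standard library (it is symmetric, so a verifier must hold the signer's key). The class and method names are hypothetical, and pruning of old messages using the roughly synchronized clocks is omitted.

```python
import hashlib
import hmac


def sign(key: bytes, *fields) -> str:
    """Stand-in for Sign_X(...). HMAC-SHA256, NOT the DSA signature used in the paper."""
    message = "|".join(str(f) for f in fields).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class Bank:
    """Hypothetical bank: verifies a signed request, rejects replays, returns a receipt."""

    def __init__(self, bank_key, user_keys, balances):
        self.bank_key = bank_key
        self.user_keys = user_keys   # user name -> key used to verify that user
        self.balances = balances     # user name -> credits
        self.seen = set()            # recently accepted requests

    def transfer(self, sender, receiver, amount, time_, signature):
        expected = sign(self.user_keys[sender], sender, receiver, amount, time_)
        if not hmac.compare_digest(expected, signature):
            raise ValueError("bad signature")
        request = (sender, receiver, amount, time_)
        if request in self.seen:
            raise ValueError("replayed request")
        if amount <= 0 or self.balances[sender] < amount:
            raise ValueError("funds not available")
        self.seen.add(request)
        self.balances[sender] -= amount
        self.balances[receiver] += amount
        # The receipt repeats the time from the request, as in the text.
        return (sender, receiver, amount, time_,
                sign(self.bank_key, sender, receiver, amount, time_))
```

Including the time in the signed fields is what lets the sender issue two otherwise identical transfers without the second being rejected as a replay.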

The advantages of this scheme are simplicity, ef- 
ficiency, and prevention of counterfeiting. Micro- 
currency systems are generally complex, have high 
overhead, and only discourage counterfeiting. The 
disadvantages of this approach are scalability and 
vulnerability to compromise of the bank. However, 
bank operations are relatively infrequent (see § 3.3.2
for how bids can be changed without involving the
bank), so scalability is not a critical issue for
moderate numbers of users and hosts, as we show in § 4.4.
The vulnerability to compromise of the bank could
be a problem, and we discuss possible solutions later
in the paper.

3.3 Auctioneer 

Auctioneers serve four main purposes: management 
of local resources, collection of bids from users,
allocation of resources to users according to their
bids, and advertisement of the availability of local
resources.

3.3.1 Virtualization 

To manage resources, an auctioneer relies on a virtu- 
alization system and a local allocation system. Our 
implementation uses Linux VServer (with modifi- 
cations from PlanetLab) for virtualization. VServer 
provides each user with a separate file system and 
gives the appearance that he is the sole user of a ma- 
chine, even if the physical hardware is being shared. 
The user accesses this virtual machine by using ssh. 

VServers virtualize at the system call level, which 
provides the advantage of low overhead. We show in 
§ 4.3 that the total auctioneer overhead, including
VServers, is at most ten percent and usually much
less. Systems that virtualize at the hardware level
like VMware or Disco have significantly more
overhead.

For local allocation, Tycoon uses the plkmod
proportional share scheduler, which implements the
standard proportional share scheduling abstraction.
The disadvantage of VServers and plkmod is
that they do not completely virtualize system
resources. This is why Tycoon currently only manages
CPU cycles. Later in the paper, we discuss new
virtualization and allocation systems that provide
this functionality.

3.3.2 Setting Bids 

The second purpose of auctioneers is to collect bids 
from users. Auctioneers store bids as two parts for 
each user: the local account balance, and the bidding 
interval. The local balance is the amount of money 
the user has remaining locally. The bidding interval 
specifies the number of seconds over which to spend 
the local balance. Users have two methods of chang- 
ing this information: fund and set_interval. fund 
transfers money from the user's bank account to the
auctioneer's bank account, and conveys that fact to
the auctioneer. It has the disadvantage that it
requires significant latency (100 ms) and it requires
communication with the bank, which may be offline 
or overloaded. set_interval sets the bidding inter- 
val at the auctioneer without changing the local bal- 
ance. It only requires direct communication between 
the client and the auctioneer, so it provides a low 
latency method of adjusting the bid until the local 
balance is exhausted. 

In describing the fund protocol, we again use Alice 
and Bob as examples. We assume that Alice and Bob 
already have each other's public keys and that Alice 
has the value nonce_Alice. A nonce is a unique token
which Bob has never seen from Alice before. In the 
current implementation it is an increasing counter. 
First, Alice gets a bank receipt as described above. 
She then sends the following message to Bob: 

    Alice, Bob, nonce_Alice, interval, receipt,
    Sign_Alice(Alice, Bob, nonce_Alice, interval, receipt)

The nonce allows Bob to detect replay attacks. Bob 
verifies that he is the recipient of this message, that 
the nonce has not been used before, that the receipt 
specifies that Alice has transferred money into his 
account, that the bank has correctly signed the
receipt, and that Alice has correctly signed this
message. Assuming this is all correct, Bob increases
Alice's local balance by the amount specified in the
receipt and sets Alice's bidding interval to interval. 
set_interval is identical, except that it does not 
include the bank receipt. 
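
The auctioneer-side bookkeeping for fund and set_interval might look like the sketch below. It assumes that signature and receipt verification have already succeeded and models only the state changes and the increasing-counter nonce check. The class and attribute names are hypothetical; the default interval is the 10,000,000-second default mentioned in § 2.2.

```python
class AuctioneerAccount:
    """Hypothetical per-user state held by one auctioneer (names are illustrative)."""

    def __init__(self):
        self.balance = 0.0            # local balance, in credits
        self.interval = 10_000_000.0  # bidding interval, in seconds
        self.last_nonce = -1          # highest nonce accepted so far

    def _check_nonce(self, nonce):
        # With an increasing counter, any nonce at or below the last one is a replay.
        if nonce <= self.last_nonce:
            raise ValueError("replayed or stale nonce")
        self.last_nonce = nonce

    def fund(self, nonce, interval, receipt_amount):
        # Requires a bank receipt: adds to the local balance and sets the interval.
        self._check_nonce(nonce)
        self.balance += receipt_amount
        self.interval = interval

    def set_interval(self, nonce, interval):
        # No bank involvement: only the interval changes.
        self._check_nonce(nonce)
        self.interval = interval

    @property
    def bid_rate(self):
        # Credits per second the user is offering for the resource.
        return self.balance / self.interval
```

Mirroring the commands in Table 1, funding with 90 credits over 1000 seconds gives a bid rate of 0.09 credits per second, and a later set_interval to 2000 seconds halves that rate without contacting the bank.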

The key advantage of separating fund and 
set_interval is that it reduces the frequency of 
bank operations. Users only have to fund their hosts 
when they wish to change the set of hosts they are 
running on or when they receive income. For most 
users and applications, we believe this is on the or- 
der of days, not seconds. Between fundings, users can 
modify their bids by changing the bidding interval, 
as described in the next section. 

3.3.3 Allocating Resources 

The third and most important purpose of auction- 
eers is to use virtualization and the users' bids to 
allocate resources among the users and account for 
usage. Although our current implementation only
allocates CPU cycles because of virtualization
limitations, the following applies to both rate-based
(e.g., CPU cycles and network bandwidth) and
space-based (e.g., physical memory and disk space)
resources. In addition, we initially describe a
proportional share-based function, but there are other
allocation functions with desirable properties (e.g.,
Generalized Vickrey Auctions, described below).

For each user i, the auctioneer knows the local
balance $b_i$ and the bidding interval $t_i$. The
auctioneer calculates the bid as $b_i/t_i$. Consider a
resource with total size R (e.g., the number of cycles
per second of the CPU or the total disk space) over
some period P. The allocation function for $r_i$, the
amount of resource allocated to user i over P, is

\[ r_i = \frac{b_i/t_i}{\sum_{j=0}^{n-1} b_j/t_j} \, R . \]

Let $q_i$ be the amount of the resource that i actually
consumes during P; then the amount that i pays per
second is

\[ s_i = \min\left(\frac{q_i}{r_i},\, 1\right) \frac{b_i}{t_i} . \]

This allows users who do not use their full allocation 
to pay less than their bid, but in no case will a user 
pay more than his bid. 
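
As a concrete illustration of the allocation and charging rules, the following sketch computes each user's share and per-second charge. The function names and the dictionary layout are assumptions made for the example; only the two formulas come from the text.

```python
def allocate(bids, R):
    """Proportional share: bids maps user -> (b_i, t_i); returns user -> r_i."""
    rates = {user: b / t for user, (b, t) in bids.items()}
    total = sum(rates.values())
    if total == 0:
        return {user: 0.0 for user in bids}
    return {user: rate / total * R for user, rate in rates.items()}


def charge_rate(b, t, q, r):
    """Credits per second actually charged: min(q_i / r_i, 1) * b_i / t_i."""
    if r <= 0:
        return 0.0
    return min(q / r, 1.0) * b / t


# Two users on one CPU (R = 1.0 means the whole CPU).
bids = {"alice": (90.0, 1000.0), "bob": (10.0, 1000.0)}
shares = allocate(bids, R=1.0)   # alice gets about 0.9, bob about 0.1

# Alice uses only half of her allocation, so she pays half of her bid rate.
alice_charge = charge_rate(90.0, 1000.0, q=0.45, r=shares["alice"])
```

Here Alice's bid rate is 0.09 credits per second, but because she consumes half of her allocation she is charged about 0.045 credits per second.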

There are a variety of implementation details. 
First, the auctioneer gets the number of cycles used 
by each user from the kernel to determine if $q_i < r_i$.
Second, we set P = 10s, so the auctioneer charges 
users and recomputes their bids every 10 seconds. 
This value is a compromise between the overhead 
of running the auctioneer and the latency in chang- 
ing the auctioneer's allocation. With tighter integra- 
tion with the kernel and the virtualization system, 
P could be as small as the scheduling interval (10 ms
on most systems). Third, users whose bids are too
small relative to the other users are logged off the
system. Users who bid for less than 0.1% of the
resource would run infrequently while still consuming
overhead for context-switching, accounting, etc., so 
the auctioneer logs them off, starting with the small- 
est bidder. 

The advantages of this allocation function
are that it is simple, it can be computed in O(n)
time, where n is the number of bidders, it is fair,
and it can be optimized across multiple auctioneers
by an agent (described in § 3.4.1). It is fair in the sense
that all users who use their entire allocation pay the
same per unit of the resource.

The disadvantage is that it is not strategyproof. In 
the simple case of one user running on a host, that 
user's best (or dominant) strategy is to make the 
smallest possible bid, which would still provide the 
entire host's resources. If there are multiple users, 
then the user's dominant strategy is to bid his
valuation. Since the user's dominant strategy depends



on the actions of others, this mechanism is not
strategyproof. One possible strategyproof mechanism is a
Generalized Vickrey Auction (GVA). However,
this requires O(n^2) time, it is not fair in the sense
described above, and it is not clear how to optimize 
bidding across multiple GVA auctioneers. 

3.3.4 Advertising Availability 

The auctioneer must advertise the availability of lo- 
cal resources so that user agents can decide whether 
to place bids. For each resource available on the lo- 
cal host, the auctioneer advertises the total amount 
available, and the total amount spent at the last
allocation. In other words, the auctioneer reports

\[ \sum_{j=0}^{n-1} s_j . \]

This may be less than the sum of the bids because
some tasks did not use their entire allocation. We
report this instead of the sum of the bids because it
allows the agent to more accurately predict the cost
of resources (as required by the algorithm described in
§ 3.4.1). Note that this information allows agents to
make appropriate bids without revealing the exact 
amounts of other users' individual bids. Revealing 
that information would allow users to know each 
other's valuations, which would allow gaming the 
auctions. 

3.4 Agent 

The role of a Tycoon agent is to interpret a user's
preferences, examine the state of the system, make
bids appropriately, and verify that the resources
were provided. The agent is involved in steps 2, 3,
4, and 6 of Figure 1. Given the diversity of possible
preferences, we chose to separate agents from the
infrastructure to allow agents to evolve independently.
This is a similar approach to the end-to-end
principle used in the design of the Internet,
where application-specific functionality is contained
in the end-points instead of in the infrastructure. 
This allows the infrastructure to be efficient, while 
supporting a wide variety of applications. 

There are a wide variety of preferences that a user 
can specify to his agent. Tycoon provides for both 
high-level preferences that an agent interprets and 
low-level preferences that users must specify in
detail. Examples of high-level preferences are
wanting to maximize the expected number of CPU
cycles or to seek machines with a minimum amount of
memory, or some combination of those preferences. 



Tycoon allows uncertainty in the exact amount of 
resource received because other applications on the 
same host may not use their allocation and/or other 
users may change their bids. 

3.4.1 Best Response Algorithm 

In a system with many machines, it is very difficult 
for users to bid on individual machines to maximize 
their utilization of the system. In Tycoon, we allow 
the user to only specify the total bids, or the budget, 
he is willing to spend and let the agent compute the 
bids on the machines to maximize the user's utility. 
In order to compute the optimum bids, the agent 
must first know the user's utility as a function of 
the fraction of the machines assigned to the user. 
Since it is difficult, if not impossible, to figure out 
the exact formulation of the utility function, we as- 
sume a linear utility function for each user. That is, 
each user specifies a non-negative weight for each
machine to express his preference for the machine.
Such a weight is chosen by the user and determined
mainly by two factors: the system configuration and
the user's need. Weights may vary from user to user.
For example, one user may have higher weight on 
machine A because it has more memory, and an- 
other user may have higher weight on B because it 
has a faster CPU. The weights are kept private to 
the users. 

Now, suppose that there are n machines, and a
user has weight $w_i$ on machine i for $1 \le i \le n$.
If the user gets fraction $r_i$ from machine i, then
his utility is

\[ U = \sum_{i=1}^{n} w_i r_i . \]

The agent's goal is to maximize the user's utility
under a given budget, say X, and the others'
aggregated bids on the machines. Suppose that $y_i$ is
the total bid by other users on machine i. The user's
share on i is then $x_i/(x_i + y_i)$ if he bids $x_i$ on
machine i. Therefore, the agent needs to solve the
following optimization problem:

\[ \text{maximize} \quad \sum_{i=1}^{n} w_i \frac{x_i}{x_i + y_i}, \quad \text{s.t.} \]

\[ x_i \ge 0 \ \text{for } 1 \le i \le n, \ \text{and} \quad \sum_{i=1}^{n} x_i = X . \]

This optimization problem can be solved by using 
the following algorithm. 



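1. Sort $w_i/y_i$ in decreasing order, and suppose that

\[ \frac{w_1}{y_1} \ge \frac{w_2}{y_2} \ge \dots \ge \frac{w_n}{y_n} . \]

2. Compute the largest k such that

\[ \frac{\sqrt{w_k y_k}}{\sum_{j=1}^{k} \sqrt{w_j y_j}} \left( X + \sum_{j=1}^{k} y_j \right) - y_k \ge 0 . \]

3. Set $x_i = 0$ for $i > k$, and for $1 \le i \le k$, set

\[ x_i = \frac{\sqrt{w_i y_i}}{\sum_{j=1}^{k} \sqrt{w_j y_j}} \left( X + \sum_{j=1}^{k} y_j \right) - y_i . \]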

The above algorithm takes O(n log n) time, as
sorting is the most expensive step. It is derived using
the Lagrangian multiplier method. Intuitively, the
optimum is achieved by the bids where the bid on each
machine has the same marginal value. The challenge
is to select the machines to bid on. Roughly
speaking, one should prefer to bid on a machine if the
user has a high weight on it and if others' bids on
that machine are low. That is the intuition behind
the first sorting step. We omit the correctness proof
of the algorithm due to space limitations.

One problem with the above algorithm is that it
spends the entire budget. In the situation when there
are already heavy bids on the machines, it might
be wise to save the money for later use. To deal
with this problem, a variation is to also prescribe a
threshold A to the agent and require that the marginal
value on each machine is not lower than A, in addition
to the budget constraint. Such a problem can be solved
by an easy adaptation of the algorithm.
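
The best response computation can be sketched in Python as follows. This is a sketch, not the authors' code: it follows from applying the Lagrangian condition (equal marginal value $w_i y_i / (x_i + y_i)^2$ on every machine that receives a bid) to the optimization problem stated above, and it assumes that every $y_i$ is strictly positive, that the budget is positive, and that at least one weight is positive. The function and argument names are hypothetical.

```python
from math import sqrt


def best_response(weights, others, budget):
    """Bids x_i maximizing sum_i w_i * x_i / (x_i + y_i) subject to sum_i x_i = budget.

    weights[i] is w_i, others[i] is y_i (assumed > 0), budget is X (assumed > 0).
    """
    n = len(weights)
    # Step 1: consider machines in decreasing order of w_i / y_i.
    order = sorted(range(n), key=lambda i: weights[i] / others[i], reverse=True)

    # Step 2: find the largest prefix length k whose last machine still gets a
    # non-negative bid when the budget is spread over the first k machines.
    k, k_sum_sqrt, k_sum_y = 0, 0.0, 0.0
    sum_sqrt, sum_y = 0.0, 0.0
    for rank, i in enumerate(order, start=1):
        root = sqrt(weights[i] * others[i])
        sum_sqrt += root
        sum_y += others[i]
        if sum_sqrt > 0 and root / sum_sqrt * (budget + sum_y) - others[i] >= 0:
            k, k_sum_sqrt, k_sum_y = rank, sum_sqrt, sum_y

    # Step 3: bid on the first k machines, and bid nothing on the rest.
    bids = [0.0] * n
    for i in order[:k]:
        root = sqrt(weights[i] * others[i])
        bids[i] = root / k_sum_sqrt * (budget + k_sum_y) - others[i]
    return bids


# Two machines with equal competition; the user values the first twice as much.
example = best_response(weights=[2.0, 1.0], others=[1.0, 1.0], budget=1.0)
```

In the example the agent bids on both machines, but puts roughly three quarters of the budget on the machine with the higher weight; the marginal value per credit is the same on both.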

3.4.2 Predictability 

Instead of maximizing its expected value, some 
applications may prefer to maintain a minimum 
amount of a resource. An example of this is mem- 
ory, where an application will swap pages to disk 
if it has less physical memory than some minimum,
but few applications benefit significantly from
having more than that. Tycoon allows agents to express
this preference by putting larger bids on fewer
machines. Let R be the total resource size on a host
and B be the sum of the users' bids for the resource,
excluding user i. From the allocation function in
§ 3.3.3, user i's agent can compute that to get $r_i$
of a resource, it should bid

\[ b_i = \frac{r_i B}{R - r_i} . \]

However, this only provides an expected amount
of $r_i$. To provide higher assurances of having this
amount, the agent bids more than $b_i$. To determine
how much more, the agent maintains a history of the
bids at that host to determine the likelihood that a
particular bid will result in obtaining the required
amount of a resource. Assuming that the application
only uses $r_i$ of the resource, the user will pay more
per unit of the resource than if his agent had just
bid $b_i$ (see § 3.3.3), but that is the price of having
more predictability.
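
The bid needed for a target amount follows from solving the proportional share rule for the user's own bid. The sketch below computes it and then shows one possible way to pad the bid using history. The padding rule (sizing the bid against a high quantile of previously observed competing bid totals) is an assumption made for illustration; the text does not specify how the history is used. All function and parameter names are hypothetical.

```python
def required_bid(r, R, B):
    """Bid that yields r out of R when the other users' bids sum to B.

    Obtained by solving r = b / (b + B) * R for b.
    """
    if not 0 <= r < R:
        raise ValueError("target must satisfy 0 <= r < R")
    return r * B / (R - r)


def padded_bid(r, R, history_B, quantile=0.9):
    """Assumed policy: bid as if the competition were at a high quantile of its history."""
    ordered = sorted(history_B)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return required_bid(r, R, ordered[index])


# To expect half of a host while others bid 4.0 in total, bid 4.0 as well.
base = required_bid(r=0.5, R=1.0, B=4.0)

# With a history of competing totals between 1 and 10, the padded bid is larger.
padded = padded_bid(r=0.5, R=1.0, history_B=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

A higher quantile buys more assurance at a higher expected cost per unit, which is the trade-off the text describes.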

3.4.3 Scalability 

Since the computational overhead of the agent is low, 
the main scalability concern is communications over- 
head. When making bids, a user agent may have to 
contact a large number of auctioneers, possibly re- 
sulting in a large queueing delay. For example, to 
use 100 hosts, the agent must send 100 messages. 
Although the delay to do this is proportional to the 
amount of resources the user is using, for very large 
numbers of hosts and a slow and/or poorly con- 
nected agent host, the delay may be excessive. In 
this case, the agent can use an application-layer
multicast service (e.g., Bullet) to reduce the delay.
Since changing a bid consists of simply setting an
interval, the user agent can use a multicast service to
send out the same interval to multiple auctioneers.
This would essentially make the communication
delays logarithmic with respect to the number of hosts.

3.4.4 Verification 

One potential problem with all auction-based sys- 
tems is that auctioneers may cheat by charging 
more for resources than the rules of the auction dic- 
tate. However, one advantage of Tycoon is that it 
is market-based so users will eventually find more 
cost-effective auctioneers. Cost-effectiveness is an 
application-specific metric. For example, an appli- 
cation may prefer a slow host because it has a favor- 
able network location. Users who are interested in 
CPU cycles would view that as a host with poor 
cost-effectiveness. However, in many applications, 
the agent can measure cost-effectiveness fairly ac- 
curately. As an example, the rendering application 
we use in § 4 uses frames rendered per second as its
utility metric. As a result, the cost-effectiveness is 
frames rendered per second per credit spent for each 
host. 

The measured cost-effectiveness is then used as 
the host weight for the best-response algorithm. This 
algorithm will automatically drop a host from
bidding when it sees that it is significantly less
cost-effective than the others. Effectively, Tycoon treats
a cheating host as a host with poor cost-effectiveness.
Therefore, we do not need sophisticated techniques to
detect or prevent cheating. If no agent wants to spend
credits at a cheating auctioneer, the monetary
incentive to cheat is greatly reduced.
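
For the rendering example, the cost-effectiveness measurement reduces to a one-line computation per host. The sketch below is illustrative only: the function name and the sample numbers are made up, and the zero-credit guard is an added assumption.

```python
def cost_effectiveness(frames_rendered, seconds, credits_spent):
    """Frames rendered per second per credit spent on one host."""
    if seconds <= 0 or credits_spent <= 0:
        return 0.0  # assumption: nothing measured yet counts as zero
    return frames_rendered / seconds / credits_spent


# Two hosts deliver the same throughput, but the second charges twice as much.
honest = cost_effectiveness(frames_rendered=120, seconds=60.0, credits_spent=2.0)
cheater = cost_effectiveness(frames_rendered=120, seconds=60.0, credits_spent=4.0)
```

The overcharging host ends up with half the weight of the honest one, so an agent that uses these values as host weights shifts its bids away from it without ever having to prove that cheating occurred.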

3.5 Funding Policy 

Funding policy determines how users obtain funds. 
We define open loop and closed loop funding poli- 
cies. In an open loop funding policy, users are funded 
at some regular rate. The system administrators 
set their income rate based on exogenously deter- 
mined priorities. Providers accumulate funds and re- 
turn them to the system administrators. In a closed 
loop (or peer-to-peer) funding policy, users them- 
selves bring resources to the system when they join. 
They receive an initial allotment of funds, but they 
do not receive funding grants after joining. Instead, 
they must earn funds by enticing other users to pay 
for their resources. A closed loop funding policy is 
preferable because it encourages service providers to 
provide desirable resources and therefore should re- 
sult in higher economic efficiency. 

4 Experiments 

4.1 Experimental Setup 

The experiments in this section were run on the
hosts shown in Table 2. All were running Linux with
the PlanetLab 2.4.22 kernel, which includes VServer
and plkmod.

4.2 Agility 

In this section, we report the results of experiments
to test agility, the ability to adapt to changes in
demand. As a workload, we used the Maya 6.0 image
rendering software to render frames in a movie
scene. The jobs were dispatched using the Muster
job queue, an off-the-shelf product that manages
distributed rendering jobs. During the experiment, two
users were rendering concurrently on each node.

First, we examine the time for a user to acquire
more resources to finish his rendering job sooner. In
Figure 2, a user has initialized his nodes with $10 to
be spent over 30,000 seconds. He submits a 200 frame
rendering job to the Tycoon cluster. Someone else is
already running on the cluster. Using the bids of
both users, auctioneers allocate the new user about
twenty percent of each node. After running for three
minutes, the user notices that the job is not likely
to finish early enough, so he changes the spending
interval to 300 seconds on all nodes. This will leave
him with fewer credits at the end of the run than if
he had left the interval at 30,000 seconds, but it is
worth it to him. The time at which he changes the
interval is marked by the left vertical line. About
twenty seconds later, the right vertical line marks the
time at which the user is able to detect an increased
rate of rendering. Afterward, the frames finish at an
increased speed, and the job finishes on time.

Figure 2: This figure shows a user increasing his share
at 190 seconds by decreasing the bidding interval. As a
result, the throughput increases by 210 seconds.

This demonstrates the system's ability to quickly
reallocate resources. A user may need such a
reallocation because, as in this case, he cannot
accurately estimate the resource requirements of his
application. Other possible causes are that hosts have
failed, the load has increased, or the user's deadline
has changed. The agility of the system allows users to
compensate for uncertainty.

In a second experiment, we examine the system's 
ability to change allocations when a high priority 
job is started. In this scenario, two users are render- 
ing on the cluster. One user performs a low priority 
render, and he funds his nodes with $10 for 100,000 
seconds. A second user funds his nodes with $10 for 
10,000 seconds. Initially, only the low priority job 
is running, but after 220 seconds, the second user 
submits a rendering job to the system. 

Figure 3 shows the average rate at which frames
are finished for the two jobs. First the low priority
job runs alone, at an average rate of 1.1 frames per
second. When the second user submits the high priority
job (marked with a vertical line in the figure), the
throughput of the low priority job decreases almost
immediately to 0.2 frames per second, and the high
priority job starts to render at 0.9 frames per second.
Processor Variety    CPU      Memory  Disk        # nodes  Location
Pentium III          1 GHz    2 GB    32 GB SCSI  4        US
Mobile Pentium III   900 MHz  512 MB  40 GB IDE   8        UK
Pentium III          550 MHz  256 MB  10 GB IDE   2        US
Pentium II           450 MHz  128 MB  10 GB IDE   6        UK

Table 2: Specifications of the four types of computers used in the test cluster.

Figure 3: This figure shows a low priority job with a
small share getting lower throughput when a high
priority job arrives.



Figure 4: This figure shows how a user that runs infre- 
quently can receive more resources when he does run in 
comparison to a user that runs continuously. 



As soon as the high priority job finishes, the low 
priority job starts to utilize the CPUs again, and 
gets an increased throughput. 

When the high priority job first starts, it has lower
throughput because it is waiting for disk I/O. During
that time, the low priority job is able to continue to
utilize the CPU. As soon as the high priority job
is ready to run, it produces frames at almost full
speed. Based on the bids, its share is 90%. The actual
throughput is on average 0.9/1.1 (about 82%), which is
slightly lower. This is also because of disk I/O delays.
The throughput penalty from I/O is higher for the high
priority task than for the low priority task because
it issues more I/O operations. This is an artifact of
our version of VServer being unable to regulate disk
I/O bandwidth. If our virtualization layer had that
capability, the actual throughput would be closer to
the ideal of 90%.
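
The 90% figure can be checked with a short calculation. The sketch below assumes that a user's bid on a node is simply his funds divided by his spending interval, and that each node is divided in proportion to the bids. The function names are illustrative and are not part of Tycoon.

```python
def bid_rate(funds, interval_seconds):
    """Credits offered per second when `funds` are spread over the interval."""
    return funds / interval_seconds

def shares(rates):
    """Divide a resource in proportion to each user's bid rate."""
    total = sum(rates.values())
    return {user: rate / total for user, rate in rates.items()}

# Second agility experiment: $10 over 100,000 s versus $10 over 10,000 s.
rates = {
    "low": bid_rate(10.0, 100_000),
    "high": bid_rate(10.0, 10_000),
}
alloc = shares(rates)

# Measured throughput of the high priority job relative to a job running alone.
measured = 0.9 / 1.1
```

Under these assumptions the high priority user's share is 10/11, or roughly 90%, while the measured ratio of 0.9/1.1 is closer to 82%, which matches the gap attributed to disk I/O above.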

In the third experiment, we show how the system
treats a user who runs infrequently in comparison to
one that runs continuously. Both users initialize their
nodes with $10 for 300 seconds. One user starts a
long continuous job on the cluster. While the user is
running alone, his share decreases in proportion to
$(1 - P/t_i)^{r/P}$, where $r$ is the time since the
start of the experiment, $t_i = 300$ is the funding
interval, and $P = 10$ is the auctioneers' update
interval. Since the infrequent user is not running, the
continuous job initially gets to use the whole cluster,
as shown in Figure 4.

After 400 seconds, the infrequent user starts
running. Since he has not spent any money, his share is
75 percent, and the job that has been running has a
25 percent share. Since both jobs continue to pay in
proportion to their balance, their shares remain at 75
and 25 percent, respectively, until the infrequent user
stops running. The infrequent user returns at 1300
seconds and again he gets most of the resources. In
this case, he gets most of the resources because the
continuous user's share has dropped considerably.
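
The effect of paying in proportion to the remaining balance can be sketched as follows. This is an idealized model, not the auctioneer code. It assumes that at every update interval a running user pays the fraction P/t_i of his remaining balance, that an idle user pays nothing, and that shares are proportional to balances when both users have the same funding interval. The function names are hypothetical.

```python
def remaining_fraction(elapsed_s, funding_interval_s=300.0, update_interval_s=10.0):
    """Fraction of the initial balance left after running for `elapsed_s` seconds."""
    updates = int(elapsed_s // update_interval_s)
    return (1.0 - update_interval_s / funding_interval_s) ** updates

def shares_when_infrequent_user_arrives(elapsed_s):
    """Return (infrequent, continuous) shares at the moment the idle user starts."""
    continuous = remaining_fraction(elapsed_s)  # has been paying all along
    infrequent = 1.0                            # has not spent anything yet
    total = continuous + infrequent
    return infrequent / total, continuous / total

infrequent_share, continuous_share = shares_when_infrequent_user_arrives(400)

# While both run, both balances shrink by the same factor at each update,
# so the ratio between them (and therefore the shares) does not change.
factor = remaining_fraction(100)
later_infrequent = (1.0 * factor) / (1.0 * factor + remaining_fraction(400) * factor)
```

Under these assumptions the infrequent user receives roughly four fifths of each node when he arrives, in the same range as the 75/25 split reported above, and the split stays constant for as long as both users keep running.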

The key point about this result is that the system
encourages efficient usage of resources even when
users do not make explicit bids. In this experiment,
the users' bids were identical, which could have been
set when their accounts were created. Despite this,
the infrequent user is rewarded for being judicious in
his resource consumption, while the continuous user
is penalized for running all the time. In comparison,
a proportional share system would allocate 50% of
the resources to each user when both are running.
This gives the continuous user no incentive to stop
running. The performance improvement for the
infrequent user is $(0.75 - 0.50)/0.50 = 50\%$ for one
continuous user. For $n$ continuous users, the
performance improvement is $(0.75 - (1/n))/(1/n)$, which
goes to infinity as $n$ goes to infinity.

Figure 5: System Call Performance

Figure 6: CPU-bound Task Performance

Figure 7: Disk Read Performance

4.3 Host Overhead 

This set of experiments measures the overhead
incurred by using Tycoon rather than using the same
Linux computer without Tycoon. This overhead
includes VServer, plkmod, and the auctioneer overhead.
We compared this relative performance for
three distinct types of operations. They are
illustrated in Figures 5, 6, 7, and 8 for system call
overhead, CPU-bound computation, disk reading, and
disk writing, respectively. In these experiments, from
one to eight programs designed to test a particular
type of operation are invoked simultaneously by ssh.
For the Tycoon experiments, each program is started
as a distinct user. In the root scenario, the programs
are all run as root. The sum of the scores of all of
the concurrent processes is plotted as a function of
the number of concurrent processes.

Figure 8: Disk Write Performance

Operation                SLS   Bank  Auctioneer  Agent
Registration (per min.)  260   -     9634        -
SLS query (20 hosts)     89K   -     -           311
Bank transfer            -     1198  -           610
Account creation         -     -     5901        3592
Spending rate change     -     -     793         719

Table 3: Bytes sent from the specified entity while
conducting the specified operation.

For CPU-bound processes and for a few I/O- 
bound processes, Tycoon has less than five percent 
overhead. We expect the bulk of cluster applications 
to be similar to these micro-benchmarks. For pro- 
cesses that involve many system calls, the overhead 
is capped at ten percent, but we do not expect many 
Tycoon processes to be system call-heavy. The over- 
head for Tycoon is most significant for many disk 
reading processes. This may be due to the additional 
memory overhead of VServer reducing the size of the 
buffer cache, but we are still investigating this. 

4.4 Network Overhead 

The primary bottlenecks that prevent the unfettered
scaling of a Tycoon cluster are the two centralized
servers, the service location server (SLS) and the
bank. Table 3 quantifies the costs of performing the
most common operations on a Tycoon cluster.

The most frequent process is the maintenance of
soft-state between the auctioneers and the SLS.
Assuming that the SLS is allowed to use 100 Mb/s of
network bandwidth (e.g., it is on a 1 Gb/s network), it
can manage up to 75,000 Tycoon hosts. If clients use
the best response agent to operate on the Tycoon
cluster, they must issue repeated host-list queries to
the SLS to compute their optimal bidding strategy. If
the agent updates its strategy once a minute, it costs
roughly 4 KB/minute per agent per host. Again
assuming this task is allocated 100 Mb/s of bandwidth,
the product of the number of agents and number of
hosts must not exceed 187M. Hence, assuming that
there are 75K hosts in the cluster, there may be up to
2500 agents running concurrently. Similarly, if there
are only 2500 hosts, there may be up to 75K agents.
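
The host limit for soft-state maintenance can be reproduced from Table 3 with a back-of-the-envelope calculation. The sketch below assumes that the only cost is the 9634 bytes per minute that each auctioneer sends to the SLS, and it ignores protocol overhead and the SLS's replies. The constant and function names are illustrative.

```python
LINK_BITS_PER_SECOND = 100_000_000     # 100 Mb/s budget taken from the text
REGISTRATION_BYTES_PER_MINUTE = 9634   # auctioneer column of Table 3

def max_hosts(link_bits_per_second, bytes_per_host_per_minute):
    """Number of hosts whose registration traffic fits in the link budget."""
    bytes_per_minute = link_bits_per_second // 8 * 60
    return bytes_per_minute // bytes_per_host_per_minute

hosts = max_hosts(LINK_BITS_PER_SECOND, REGISTRATION_BYTES_PER_MINUTE)
```

Under these assumptions the bound comes out slightly above the 75,000 hosts quoted above, which is consistent with that figure being a rounded estimate.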

A less frequent operation is bank transfers from
users to hosts. This task depends less on bandwidth
and more on the speed of the bank system in
performing large integer arithmetic for authentication.
On a 450 MHz Pentium III, this operation requires
an average of 100 ms. Assuming users perform a bank
operation every twenty minutes per user per host,
this bank supports an active user-host product of
12,000, which would allow 24 simultaneous active
users on a 500 host cluster. As a result, for the
immediate future, we do not believe a centralized bank
is a significant problem. One reason is that much
faster hardware is available. A 3 GHz bank should
support 6.7 times the number of users or hosts or
combination thereof. Another reason is that the
current protocol performs only one credit transfer per
connection. It could be optimized to perform
multiple transfers per connection, which would
amortize the authentication and communication costs.
Finally, twenty minutes is a very conservative estimate
of the interval between bank operations. A more likely
frequency is once a day. This would allow even the
current slow hardware and unoptimized protocol to
support a user-host product of 864,000. A centralized
bank is not likely to limit scalability in practice.
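
The bank capacity figures follow from simple arithmetic, sketched below. The sketch assumes that the bank processes one operation at a time, that each operation takes the 100 ms quoted above, and that performance scales linearly with clock speed. The function name is hypothetical.

```python
def supported_user_host_product(op_ms, period_ms):
    """Operations the bank can complete in one period, one at a time."""
    return period_ms // op_ms

OP_MS = 100  # average time per bank operation on the 450 MHz machine

# One operation per user per host every twenty minutes.
every_20_min = supported_user_host_product(OP_MS, 20 * 60 * 1000)
users_on_500_hosts = every_20_min // 500

# One operation per user per host per day.
once_a_day = supported_user_host_product(OP_MS, 24 * 60 * 60 * 1000)

# Linear clock-speed scaling from 450 MHz to 3 GHz.
clock_speedup = 3000 / 450
```

These values match the user-host products of 12,000 and 864,000, the 24 users on 500 hosts, and the factor of about 6.7 given in the text.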

5 Related Work 

In this section, we describe related work in resource 
allocation. There are two main groups: those that in- 
corporate an economic mechanism^, and those that 
do not. 

One of the key non-economic abstractions for
resource allocation in a computer science context is
Proportional Share (PS), originally documented by
Tijdeman. Each PS process $i$ has a weight $w_i$.
The share of a resource that process $i$ receives over
some interval $t$ where $n$ processes are running is

$$\frac{w_i}{\sum_{j=1}^{n} w_j} \qquad (1)$$

PS maximizes utilization because it always provides
resources to needy processes. One problem is that
PS is usually applied by giving each user a weight
and directly transferring that weight to the user's
processes. However, a user may not weigh all of his
processes equally, and PS does not give users an
incentive to differentiate their processes. As a result,
as a system becomes more loaded, the low value
processes consume more resources, until the high value
processes cannot make useful progress (as shown by
Lai, et al.).

One common method for dealing with this problem
is to rely on social mechanisms to set the PS
weights appropriately. A system administrator could
set them based on input from users or users could
"horse trade" high weights amongst themselves.
Although these mechanisms work well for small groups
of people that trust each other, they do not scale to
larger groups and they have a high overhead in user
time.

^By mechanism we mean the system that provides an
incentive for users to reveal the truth (e.g., an auction)

Most recent work on PS by Waldspurger and Weihl,
Stoica, et al., and Nieh, et al. has
focused on computationally efficient and fair
implementations. Lottery scheduling [32] is a PS-based
abstraction that is similar to the economic approach in
that processes are issued tickets that represent their
allocations. Sullivan and Seltzer extend this to
allow processes to barter these tickets. Although this
work provides the software infrastructure for an
economic mechanism, it does not provide the mechanism
itself.

Similarly, SHARP (described by Fu, et al.)
provides the distributed infrastructure to manage
tickets, but not the mechanism or agent strategies.
In addition, SHARP and work by Urgaonkar, et al.
use an overbooking resource abstraction instead
of PS. An overbooking system promises probabilistic
resources to applications. Tycoon uses a similar
abstraction for applications that require a minimum
amount of a resource.

Another class of non-economic algorithms examines
resource allocation from a scheduling perspective
(surveyed by Pinedo [23]), using combinatorial
optimization (described by Papadimitriou and Steiglitz
[22]) or by examining the resource consumption of
tasks (a recent example is work by Wierman and
Harchol-Balter). However, these assume that the
values and resource consumption of tasks are
reported accurately. This assumption does not apply
in the presence of strategic users. We view scheduling
and resource allocation as two separate functions.
Resource allocation divides a resource among
different users while scheduling takes a given allocation
and orders a user's jobs.

Examples of the economic approach are Spawn (by
Waldspurger, et al. [2]), work by Stoica, et al.,
the Millennium resource allocator (by Chun, et al.),
work by Wellman, et al. [33], and Bellagio (by
AuYoung, et al.).

Spawn and the work by Wellman, et al. use a
reservation abstraction similar to the way airline
seats are allocated. Although reservations allow low
risk, the utilization is also low because some tasks do
not use their entire reservations. Service applications
(e.g., web serving, database serving, and overlay
network routing) result in particularly low
utilization because they typically have bursty and
unpredictable loads. Another problem with reservations is
that they can significantly increase the latency to
acquire resources. A reservation by one user prevents
another user from using the resources for the
duration of the reservation, even if the new user is willing
to pay much more for the resources than the first
user. Reservations are typically on the order of
minutes or hours (Spawn used 15 minutes), which is too
much delay for a highly bursty and unpredictable
application like web serving.

The proportional share abstraction used in the 
Millennium resource allocator comes the closest to 
that used in Tycoon. We extend that abstraction 
with continuous bids, the best-response agent algo- 
rithm, and secure protocols for bidding. 

Bellagio uses a centralized allocator called SHARE
developed by Chun, et al. SHARE takes the
combinatorial auction approach to resource allocation.
This allows users to express preferences with
complementarities like wanting host A and host B,
but not wanting host A without B or B without A.
The combinatorial auction approach relies on a
centralized auctioneer to guarantee that the user either
gets both A and B or else nothing. Economic theory
predicts that solving this NP-complete problem
provides an allocation with optimal economic
efficiency. Tycoon addresses the combinatorial problem
in a possibly less economically efficient, but more
scalable way. In Tycoon, credits are only spent when
the user actually consumes resources, so the user's
agent can see that it only has A before his
application runs and thereby prevent wasting credits on an
unvalued resource. The disadvantages of the
combinatorial auction approach are the centralized
auctioneer and the difficulty of the combinatorial
auction problem. The centralized auctioneer is
vulnerable to compromise and limits the scalability of the
system, especially since it must be involved in all
allocations. Moreover, even computationally efficient
heuristic algorithms operate on the order of minutes,
while Tycoon reallocates in less than ten seconds.
Recent work by Hajiaghayi [15] on online resource
allocation may be able to reduce the delay of the
combinatorial approach.

6 Future Work 

One area of future work is more complete
virtualization. Our prototype implementation uses early
versions of VServer and plkmod, which only support
virtualization of CPU cycles. Later versions of VServer,
Xen, and the Class-based Kernel Resource
Management (CKRM) support more complete
virtualization and should be relatively straightforward
to integrate with Tycoon.

Another area of future work is to develop a
scalable banking infrastructure. One possibility is to
physically distribute the bank without
administratively distributing it. The bank would consist of
several servers with independent account databases. A
user has accounts on some subset of the servers. A
user's balance is split into separate balances on each
server. To make a transfer, users find a server where
both the payer and payee have an account and that
contains enough funds. The transfer proceeds as with
a centralized bank. Users should periodically
redistribute their funds among the servers to ensure that
one server failure will not prevent all payment.
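
The server selection step in this proposal could look roughly like the sketch below. It is an illustration of the idea only. The in-memory dictionaries standing in for the independent account databases, and the names `find_server` and `transfer`, are assumptions and do not describe an existing Tycoon interface.

```python
def find_server(servers, payer, payee, amount):
    """Return a server where both parties have accounts and the payer has
    enough funds, or None if no such server exists."""
    for name, accounts in servers.items():
        if payer in accounts and payee in accounts and accounts[payer] >= amount:
            return name
    return None

def transfer(servers, payer, payee, amount):
    """Move `amount` from payer to payee on a single server, as with a
    centralized bank, and return the name of the server that was used."""
    name = find_server(servers, payer, payee, amount)
    if name is None:
        raise LookupError("no common server with sufficient funds")
    servers[name][payer] -= amount
    servers[name][payee] += amount
    return name

# Each server keeps an independent account database (balances in credits).
servers = {
    "bank-a": {"alice": 5, "carol": 20},
    "bank-b": {"alice": 30, "bob": 10},
    "bank-c": {"bob": 4, "carol": 1},
}
used = transfer(servers, "alice", "bob", 25)
```

In this example only `bank-b` holds accounts for both users, so the transfer happens there and leaves the other servers untouched. A payment can fail even when the payer's total balance is large enough, which is why the text suggests periodically redistributing funds among the servers.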

7 Summary 

An economic mechanism is vital for large-scale
resource allocation. In this paper, we propose a
distributed market where auctioneers only manage
local resources. A user's agent sends separate bids to
these auctioneers, where each bid is for a single type
of resource at that host. The bids are continuous
bids in that they stay in effect until the user's
local balance is depleted. Resources are allocated to
users in proportion to their bids using a best-effort
model. Agents are responsible for optimizing their
users' utility.

Using our prototype implementation, we show: 1) 
continuous bids reduce the burden on users by allow- 
ing them to run without frequent interactive bidding 
while still making an efficient and low-latency allo- 
cation; 2) distributed auctioneers result in very low 
overhead for allocation; and 3) the best-response al- 
gorithm can optimize across multiple markets. 